AITopics | local detail

MR. Video: MapReduce as an Effective Principle for Long Video Understanding

Neural Information Processing Systems, Jun-10-2026, 01:39:30 GMT

The fundamental challenge of long video understanding, e.g., question answering, lies in the extensive number of frames, which makes it infeasible to densely understand the local details while comprehensively digesting the global contexts, especially within a limited context length. To address this problem, our insight is to process short video segments individually and combine these segment-level analyses into a final response. This intuition is reflected in the well-established MapReduce principle in big data processing and is naturally compatible with inference scaling at the system level. Motivated by this, we propose MR. Video (pronounced "mister video"), a long video understanding framework adopting the MapReduce principle. We define the standard operations of MapReduce in a long video understanding context: the Map steps conduct independent, sequence-parallel dense perception on short video segments, covering local details, while the Reduce steps comprehensively aggregate the segment-level results into an answer with global contexts.
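The Map and Reduce roles described in this abstract can be illustrated with a minimal Python sketch. It is not the MR. Video implementation: the function names (`split_into_segments`, `map_reduce_video`, `count_cats`, `total`), the fixed segment length, and the toy "frames" (plain strings standing in for video frames) are all assumptions made for illustration. A real system would presumably call a vision-language model in the Map step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
SegmentResult = TypeVar("SegmentResult")
Answer = TypeVar("Answer")


def split_into_segments(frames: Sequence[Frame], segment_len: int) -> List[Sequence[Frame]]:
    """Cut a long frame sequence into consecutive short segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]


def map_reduce_video(
    frames: Sequence[Frame],
    segment_len: int,
    map_fn: Callable[[Sequence[Frame]], SegmentResult],
    reduce_fn: Callable[[List[SegmentResult]], Answer],
    max_workers: int = 4,
) -> Answer:
    """Run map_fn on each segment independently, then aggregate with reduce_fn."""
    segments = split_into_segments(frames, segment_len)
    # Map: segments do not depend on each other, so they can be processed in parallel.
    # pool.map returns results in the same order as the input segments.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        segment_results = list(pool.map(map_fn, segments))
    # Reduce: combine the ordered segment-level results into one global answer.
    return reduce_fn(segment_results)


# Hypothetical stand-ins: each "frame" is a label, and the question is
# "how many frames show a cat?".
def count_cats(segment: Sequence[str]) -> int:
    return sum(1 for frame in segment if frame == "cat")


def total(results: List[int]) -> int:
    return sum(results)


frames = ["cat", "dog", "cat", "bird", "cat"]
answer = map_reduce_video(frames, segment_len=2, map_fn=count_cats, reduce_fn=total)
print(answer)
```

Because each Map call sees only its own segment, the per-call context stays short no matter how long the video is. Only the compact segment-level results reach the Reduce step.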

artificial intelligence, proceedings, video agent, (9 more...)

Technology: Information Technology > Artificial Intelligence > Vision (0.88)

ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model

Neural Information Processing Systems, Feb-8-2026, 09:47:40 GMT

Existing methods that generate text-conditioned 3D shapes consume an entire text prompt to generate a 3D shape in a single step.

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

GLoD: Composing Global Contexts and Local Details in Image Generation

Yamada, Moyuru

arXiv.org Artificial Intelligence, Apr-23-2024

Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) still remains a significant challenge. The models often fail to understand complex descriptions involving multiple objects and reflect specified visual attributes to wrong targets or ignore them. This paper presents Global-Local Diffusion (GLoD), a novel framework which allows simultaneous control over the global contexts and the local details in text-to-image generation without requiring ...

MultiDiffusion [Bar-Tal et al., 2023] places an object with specified details on a certain region using segmentation masks and a prompt for each segment. These methods work without requiring any additional training; however, they struggle to control both the global contexts (e.g., object interactions) and the local details (e.g., object colors and emotions) simultaneously. With a complex prompt containing multiple objects, the models often misinterpret specified local details, directing them to the wrong target or ignoring them, similar to the issues observed in Stable Diffusion [Rombach et al., 2022]. While splitting the complex prompt into multiple prompts allows the model to depict each object more accurately, handling the prompts independently poses limitations in addressing a global context that describes interactions and relationships between the multiple objects.

diffusion model, global context, local detail, (14 more...)

arXiv.org Artificial Intelligence

2404.15447

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

Yang, Fan, Zhang, Jianfeng, Shi, Yichun, Chen, Bowen, Zhang, Chenxu, Zhang, Huichao, Yang, Xiaofeng, Feng, Jiashi, Lin, Guosheng

arXiv.org Artificial Intelligence, Apr-9-2024

Benefiting from the rapid development of 2D diffusion models, 3D content creation has made significant progress recently. One promising solution involves fine-tuning pre-trained 2D diffusion models to harness their capacity for producing multi-view images, which are then lifted into accurate 3D models via methods like fast-NeRFs or large reconstruction models. However, because inconsistencies remain and the generated resolution is limited, the results of such methods still lack intricate textures and complex geometries. To solve this problem, we propose Magic-Boost, a multi-view conditioned diffusion model that significantly refines coarse generative results through a brief period of SDS optimization ($\sim$15 min). Compared with previous text- or single-image-based diffusion models, Magic-Boost exhibits a robust capability to generate images with high consistency from pseudo-synthesized multi-view images. It provides precise SDS guidance that aligns well with the identity of the input images, enriching the local detail in both geometry and texture of the initial generative results. Extensive experiments show Magic-Boost greatly enhances the coarse inputs and generates high-quality 3D assets with rich geometric and textural details. (Project Page: https://magic-research.github.io/magic-boost/)

arxiv preprint arxiv, diffusion model, multi-view image, (13 more...)

arXiv.org Artificial Intelligence

2404.06429

Country: Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > Promising Solution (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Aesthetic Photo Collage with Deep Reinforcement Learning

Zhang, Mingrui, Li, Mading, Chen, Li, Yu, Jiahao

arXiv.org Artificial Intelligence, Oct-19-2021

Photo collage aims to automatically arrange multiple photos on a given canvas with high aesthetic quality. Existing methods are based mainly on handcrafted feature optimization, which cannot adequately capture high-level human aesthetic senses. Deep learning offers a promising alternative, but owing to the complexity of collage and the lack of training data, a solution has yet to be found. In this paper, we propose a novel pipeline for automatic generation of aspect-ratio-specified collages, introducing reinforcement learning to collage for the first time. Inspired by manual collages, we model collage generation as a sequential decision process that adjusts spatial positions, orientation angles, placement order, and the global layout. To instruct the agent to improve both the overall layout and local details, the reward function is specially designed for collage, considering subjective and objective factors. To overcome the lack of training data, we pretrain our deep aesthetic network on a large-scale image aesthetic dataset (CPC) for general aesthetic feature extraction and propose an attention fusion module for structural collage feature representation. We test our model against competing methods on two movie datasets, and our results outperform the others in aesthetic quality evaluation. A user study is also conducted to demonstrate its effectiveness.

collage, photo collage, representation, (15 more...)

arXiv.org Artificial Intelligence

2110.09775

Country: Asia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hierarchical Autoregressive Image Models with Auxiliary Decoders

De Fauw, Jeffrey, Dieleman, Sander, Simonyan, Karen

arXiv.org Machine Learning, Mar-6-2019

Autoregressive generative models of images tend to be biased towards capturing local structure, and as a result they often produce samples which are lacking in terms of large-scale coherence. To address this, we propose two methods to learn discrete representations of images which abstract away local detail. We show that autoregressive models conditioned on these representations can produce high-fidelity reconstructions of images, and that we can train autoregressive priors on these representations that produce samples with large-scale coherence. We can recursively apply the learning procedure, yielding a hierarchy of progressively more abstract image representations. We train hierarchical class-conditional autoregressive models on the ImageNet dataset and demonstrate that they are able to generate realistic images at resolutions of 128$\times$128 and 256$\times$256 pixels.

decoder, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1903.04933

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
